Overview

Dataset statistics

Number of variables: 13
Number of observations: 3407
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 532.4 KiB
Average record size in memory: 160.0 B
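The two memory figures are consistent with each other: the average record size is the total in-memory size divided by the number of observations. Here is a quick arithmetic check, assuming the report uses binary units (1 KiB = 1,024 bytes):

```python
# Figures copied from "Dataset statistics" above.
TOTAL_SIZE_KIB = 532.4
N_OBSERVATIONS = 3407

# Convert KiB to bytes (1 KiB = 1024 B), then divide by the number of rows.
avg_record_bytes = TOTAL_SIZE_KIB * 1024 / N_OBSERVATIONS
print(f"Average record size: {avg_record_bytes:.1f} B")  # Average record size: 160.0 B
```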

Variable types

Numeric: 5
Categorical: 7
DateTime: 1

Alerts

STATE has high cardinality: 52 distinct values [High cardinality]
HCP_SPECIALTY has high cardinality: 78 distinct values [High cardinality]
HCP_GENDER is highly correlated with PATIENT_ID [High correlation]
HCP_SPECIALTY is highly correlated with PATIENT_ID [High correlation]
INSURANCE_TYPE is highly correlated with PATIENT_ID [High correlation]
NUM_CONDITIONS is highly correlated with PATIENT_AGE_DIAGNOSED [High correlation]
PATIENT_AGE_DIAGNOSED is highly correlated with NUM_CONDITIONS [High correlation]
PATIENT_GENDER is highly correlated with PATIENT_ID [High correlation]
PATIENT_ID is highly correlated with HCP_GENDER and 6 other fields [High correlation]
STATE is highly correlated with PATIENT_ID [High correlation]
TARGET is highly correlated with PATIENT_ID [High correlation]
TXN_LOCATION_TYPE is highly correlated with PATIENT_ID [High correlation]
INSURANCE_TYPE is highly imbalanced (58.0%) [Imbalance]
PATIENT_ID is uniformly distributed [Uniform]
PATIENT_ID has unique values [Unique]
NUM_CONTRAINDICATIONS has 2434 (71.4%) zeros [Zeros]
PATIENT_AGE_DIAGNOSED has 38 (1.1%) zeros [Zeros]
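Most of the High correlation alerts involve PATIENT_ID, which has a different value in every row. One plausible explanation is this: when a column like that is compared with a categorical variable using an association measure such as Cramér's V, each ID maps to exactly one category, so the measure saturates at 1 whether or not any real relationship exists. The sketch below shows the effect with plain Cramér's V (no bias correction) on toy data. It illustrates the mechanism only; it is not claimed to be the exact computation ydata-profiling ran for this report.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Plain Cramér's V (no bias correction) between two categorical sequences."""
    n = len(x)
    x_counts = Counter(x)
    y_counts = Counter(y)
    joint = Counter(zip(x, y))
    chi2 = 0.0
    for xv, xc in x_counts.items():
        for yv, yc in y_counts.items():
            expected = xc * yc / n  # expected cell count under independence
            observed = joint.get((xv, yv), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    if k == 0:
        raise ValueError("both inputs need at least two distinct values")
    return math.sqrt(chi2 / (n * k))


patient_id = [1, 2, 3, 4, 7, 8, 9, 10]            # unique per row, like PATIENT_ID
gender = ["M", "M", "F", "M", "F", "F", "M", "F"]  # toy values, unrelated to the IDs
print(round(cramers_v(patient_id, gender), 6))    # 1.0
```

Any column whose values are all distinct gives the same result against any other categorical column. This is why identifier columns are usually left out when association alerts are interpreted.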

Reproduction

Analysis started: 2024-12-17 00:16:38.017140
Analysis finished: 2024-12-17 00:17:52.784577
Duration: 1 minute and 14.77 seconds
Software version: ydata-profiling v4.12.1
Download configuration: config.json

Variables

PATIENT_ID
Real number (ℝ)

High correlation, Uniform, Unique

Distinct: 3407
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2003.7831
Minimum: 1
Maximum: 4020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 199.3
Q1: 1000
Median: 1997
Q3: 3006.5
95th percentile: 3821.7
Maximum: 4020
Range: 4019
Interquartile range (IQR): 2006.5
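The quartile figures can be checked with Python's standard library. `statistics.quantiles(..., method="inclusive")` interpolates linearly between sorted values, which is the same rule pandas' `Series.quantile` uses by default. We assume, without confirmation, that the report used this interpolation rule. The sketch uses a small toy list, not the real column:

```python
import statistics


def quartile_summary(values):
    """Min, quartiles, max, range and IQR using linear interpolation between order statistics."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {"min": lo, "Q1": q1, "median": median, "Q3": q3,
            "max": hi, "range": hi - lo, "IQR": q3 - q1}


# Toy data: the ten smallest PATIENT_ID values listed further below.
summary = quartile_summary([1, 2, 3, 4, 7, 8, 9, 10, 11, 12])
print(summary)
```

For PATIENT_ID itself, IQR = Q3 − Q1 = 3006.5 − 1000 = 2006.5 and Range = 4020 − 1 = 4019, which matches the table.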

Descriptive statistics

Standard deviation: 1160.9316
Coefficient of variation (CV): 0.5793699
Kurtosis: -1.1992073
Mean: 2003.7831
Median Absolute Deviation (MAD): 1003
Skewness: 0.0067949401
Sum: 6826889
Variance: 1347762.2
Monotonicity: Not monotonic
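The Uniform alert agrees with the descriptive statistics. A uniform distribution has zero skewness and an excess kurtosis of −6/5 = −1.2, which is close to the reported 0.0068 and −1.1992. The sketch computes the moment-based excess kurtosis of the integers 1 to 4020, an idealized, gap-free stand-in for PATIENT_ID. It applies no small-sample bias correction (pandas' `Series.kurt` does), so on real data it will not match the report to every digit.

```python
def excess_kurtosis(values):
    """Population (moment-based) excess kurtosis: m4 / m2**2 - 3, no bias correction."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


ideal_ids = range(1, 4021)  # every ID from 1 to 4020: a perfectly uniform set
print(round(excess_kurtosis(ideal_ids), 4))  # -1.2
```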
Histogram with fixed-size bins (bins=50; plot not reproduced)

Common values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
1889 | 1 | < 0.1%
1892 | 1 | < 0.1%
1893 | 1 | < 0.1%
1894 | 1 | < 0.1%
1895 | 1 | < 0.1%
1896 | 1 | < 0.1%
1897 | 1 | < 0.1%
1898 | 1 | < 0.1%
1899 | 1 | < 0.1%
Other values (3397) | 3397 | 99.7%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%
11 | 1 | < 0.1%
12 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
4020 | 1 | < 0.1%
4019 | 1 | < 0.1%
4017 | 1 | < 0.1%
4016 | 1 | < 0.1%
4015 | 1 | < 0.1%
4012 | 1 | < 0.1%
4010 | 1 | < 0.1%
4009 | 1 | < 0.1%
4008 | 1 | < 0.1%
4007 | 1 | < 0.1%

PATIENT_GENDER
Categorical

High correlation

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB
F-Female | 1942
M-Male | 1465

Length

Max length: 8
Median length: 8
Mean length: 7.1400059
Min length: 6
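For a categorical column, Total characters is the sum of len(value) × count over the distinct values, and Mean length is that total divided by the number of rows. The sketch below reproduces the PATIENT_GENDER figures from its two category counts:

```python
# Category counts copied from the PATIENT_GENDER summary above.
value_counts = {"F-Female": 1942, "M-Male": 1465}

n_rows = sum(value_counts.values())
# Each row contributes the length of its value once.
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_rows

print(n_rows, total_chars, round(mean_length, 7))  # 3407 24326 7.1400059
```

The median length is 8 because more than half of the rows (1942 of 3407) hold the 8-character value "F-Female".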

Characters and Unicode

Total characters: 24326
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M-Male
2nd row: M-Male
3rd row: M-Male
4th row: M-Male
5th row: M-Male

Common Values

Value | Count | Frequency (%)
F-Female | 1942 | 57.0%
M-Male | 1465 | 43.0%

Length

Histogram of lengths of the category (plot not reproduced)

Words

Value | Count | Frequency (%)
f-female | 1942 | 57.0%
m-male | 1465 | 43.0%

Most occurring characters

Value | Count | Frequency (%)
e | 5349 | 22.0%
F | 3884 | 16.0%
- | 3407 | 14.0%
a | 3407 | 14.0%
l | 3407 | 14.0%
M | 2930 | 12.0%
m | 1942 | 8.0%
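The character table follows from the category counts in the same way. Count each character within each distinct value, then weight that count by how often the value occurs. Here is a sketch with `collections.Counter`, again using the PATIENT_GENDER counts:

```python
from collections import Counter

# Category counts copied from the PATIENT_GENDER summary above.
value_counts = {"F-Female": 1942, "M-Male": 1465}

char_counts = Counter()
for value, count in value_counts.items():
    for char, occurrences in Counter(value).items():
        # A character appearing twice in a value contributes twice per row.
        char_counts[char] += occurrences * count

total = sum(char_counts.values())
for char, count in char_counts.most_common():
    print(f"{char} | {count} | {100 * count / total:.1f}%")
```

This reproduces the counts and percentages in the table above. For example, "e" occurs twice in "F-Female" and once in "M-Male", giving 2 × 1942 + 1465 = 5349.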

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 24326 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 24326 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 24326 | 100.0%

NUM_CONDITIONS
Real number (ℝ)

High correlation

Distinct: 159
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.632228
Minimum: 1
Maximum: 189
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 5
Q3: 22
95th percentile: 85.7
Maximum: 189
Range: 188
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 29.761452
Coefficient of variation (CV): 1.5973105
Kurtosis: 7.2789425
Mean: 18.632228
Median Absolute Deviation (MAD): 4
Skewness: 2.600783
Sum: 63480
Variance: 885.74403
Monotonicity: Not monotonic
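The MAD reported here appears to be the raw median of absolute deviations from the median, median(|x − median(x)|). It seems to omit the 1.4826 scale factor that is sometimes applied for consistency with the normal distribution. The whole-number values in this report (4 here, 1003 for PATIENT_ID) suggest this, but it is an inference. Here is a sketch on a toy, right-skewed list:

```python
import statistics


def median_absolute_deviation(values):
    """Unscaled MAD: the median of absolute deviations from the median."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


# Toy right-skewed counts (not the real column): many small values, a few very large ones.
toy = [1, 1, 1, 2, 2, 3, 5, 8, 22, 85, 189]
print(median_absolute_deviation(toy))  # 2
print(statistics.stdev(toy))           # much larger, driven by 85 and 189
```

MAD depends only on the middle of the distribution. That is why it stays small for NUM_CONDITIONS (4), while the standard deviation (29.76) is pulled up by the long right tail (skewness 2.60, maximum 189).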
Histogram with fixed-size bins (bins=50; plot not reproduced)

Common values

Value | Count | Frequency (%)
1 | 841 | 24.7%
2 | 363 | 10.7%
3 | 221 | 6.5%
4 | 160 | 4.7%
5 | 122 | 3.6%
6 | 95 | 2.8%
7 | 89 | 2.6%
8 | 84 | 2.5%
10 | 77 | 2.3%
9 | 63 | 1.8%
Other values (149) | 1292 | 37.9%

Minimum 10 values

Value | Count | Frequency (%)
1 | 841 | 24.7%
2 | 363 | 10.7%
3 | 221 | 6.5%
4 | 160 | 4.7%
5 | 122 | 3.6%
6 | 95 | 2.8%
7 | 89 | 2.6%
8 | 84 | 2.5%
9 | 63 | 1.8%
10 | 77 | 2.3%

Maximum 10 values

Value | Count | Frequency (%)
189 | 1 | < 0.1%
186 | 1 | < 0.1%
180 | 1 | < 0.1%
179 | 1 | < 0.1%
177 | 1 | < 0.1%
175 | 1 | < 0.1%
174 | 1 | < 0.1%
171 | 1 | < 0.1%
170 | 2 | 0.1%
165 | 2 | 0.1%

NUM_CONTRAINDICATIONS
Real number (ℝ)

Zeros

Distinct: 79
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8062812
Minimum: 0
Maximum: 360
Zeros: 2434
Zeros (%): 71.4%
Negative: 0
Negative (%): 0.0%
Memory size: 30.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 24
Maximum: 360
Range: 360
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 12.525443
Coefficient of variation (CV): 3.2907297
Kurtosis: 213.12078
Mean: 3.8062812
Median Absolute Deviation (MAD): 0
Skewness: 10.035368
Sum: 12968
Variance: 156.88671
Monotonicity: Not monotonic
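Because 71.4% of NUM_CONTRAINDICATIONS values are zero, more than half the column is zero. This is why Q1, the median, and the MAD are all 0 even though the maximum is 360. The short sketch below computes the zeros share from the report's counts, then shows the same effect on a toy column:

```python
import statistics


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Share of zeros in NUM_CONTRAINDICATIONS, from the counts above.
print(f"{2434 / 3407:.1%}")  # 71.4%

# Toy column (not the real data) where most entries are zero.
toy = [0] * 7 + [1, 3, 24]
print(zero_share(toy), statistics.median(toy))
```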
Histogram with fixed-size bins (bins=50; plot not reproduced)

Common values

Value | Count | Frequency (%)
0 | 2434 | 71.4%
1 | 228 | 6.7%
2 | 117 | 3.4%
3 | 72 | 2.1%
4 | 42 | 1.2%
5 | 37 | 1.1%
6 | 27 | 0.8%
8 | 26 | 0.8%
7 | 23 | 0.7%
11 | 19 | 0.6%
Other values (69) | 382 | 11.2%

Minimum 10 values

Value | Count | Frequency (%)
0 | 2434 | 71.4%
1 | 228 | 6.7%
2 | 117 | 3.4%
3 | 72 | 2.1%
4 | 42 | 1.2%
5 | 37 | 1.1%
6 | 27 | 0.8%
7 | 23 | 0.7%
8 | 26 | 0.8%
9 | 19 | 0.6%

Maximum 10 values

Value | Count | Frequency (%)
360 | 1 | < 0.1%
184 | 1 | < 0.1%
99 | 1 | < 0.1%
97 | 1 | < 0.1%
86 | 1 | < 0.1%
81 | 1 | < 0.1%
79 | 2 | 0.1%
77 | 1 | < 0.1%
76 | 1 | < 0.1%
74 | 1 | < 0.1%

TXN_DT
Date

Distinct: 72
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.7 KiB
Minimum: 2022-04-01 00:00:00
Maximum: 2022-06-30 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed-size bins (bins=50; plot not reproduced)
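The date range covers one calendar quarter. Counting days inclusively shows how much of that range the 72 distinct dates cover. This assumes TXN_DT holds plain calendar dates, as the 00:00:00 times suggest.

```python
from datetime import date

first_day = date(2022, 4, 1)   # Minimum TXN_DT
last_day = date(2022, 6, 30)   # Maximum TXN_DT
days_in_range = (last_day - first_day).days + 1  # +1 counts both endpoints
distinct_dates = 72            # Distinct count reported above

print(days_in_range)  # 91
print(f"{distinct_dates / days_in_range:.1%} of calendar days in the range have at least one record")
```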

HCP_ID
Real number (ℝ)

Distinct: 3283
Distinct (%): 96.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11103.371
Minimum: 2
Maximum: 25342
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.7 KiB

Quantile statistics

Minimum: 2
5th percentile: 553.3
Q1: 4041.5
Median: 10895
Q3: 17573
95th percentile: 23491.4
Maximum: 25342
Range: 25340
Interquartile range (IQR): 13531.5

Descriptive statistics

Standard deviation: 7458.8617
Coefficient of variation (CV): 0.67176549
Kurtosis: -1.223735
Mean: 11103.371
Median Absolute Deviation (MAD): 6740
Skewness: 0.1424455
Sum: 37829186
Variance: 55634618
Monotonicity: Not monotonic
Histogram with fixed-size bins (bins=50; plot not reproduced)

Common values

Value | Count | Frequency (%)
10723 | 6 | 0.2%
18825 | 4 | 0.1%
24319 | 4 | 0.1%
24126 | 4 | 0.1%
13216 | 3 | 0.1%
9144 | 3 | 0.1%
18851 | 3 | 0.1%
22069 | 3 | 0.1%
2112 | 3 | 0.1%
8971 | 3 | 0.1%
Other values (3273) | 3371 | 98.9%

Minimum 10 values

Value | Count | Frequency (%)
2 | 1 | < 0.1%
4 | 2 | 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
11 | 1 | < 0.1%
13 | 1 | < 0.1%
14 | 1 | < 0.1%
19 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
25342 | 1 | < 0.1%
25320 | 1 | < 0.1%
25185 | 1 | < 0.1%
25180 | 1 | < 0.1%
25171 | 2 | 0.1%
25158 | 1 | < 0.1%
25155 | 1 | < 0.1%
25149 | 1 | < 0.1%
25133 | 1 | < 0.1%
25122 | 1 | < 0.1%

TXN_LOCATION_TYPE
Categorical

High correlation

Distinct: 26
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 KiB
OFFICE | 1055
EMERGENCY ROOM - HOSPITAL | 555
URGENT CARE FACILITY | 541
TELEHEALTH PROVIDED OTHER THAN IN PATIENT'S HOME | 380
HOSPITAL OUTPATIENT | 337
Other values (21) | 539

Length

Max length: 55
Median length: 48
Mean length: 20.942471
Min length: 6

Characters and Unicode

Total characters: 71351
Distinct characters: 28
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 5
Unique (%): 0.1%

Sample

1st row: EMERGENCY ROOM - HOSPITAL
2nd row: EMERGENCY ROOM - HOSPITAL
3rd row: OFFICE
4th row: EMERGENCY ROOM - HOSPITAL
5th row: INPATIENT HOSPITAL

Common Values

Value | Count | Frequency (%)
OFFICE | 1055 | 31.0%
EMERGENCY ROOM - HOSPITAL | 555 | 16.3%
URGENT CARE FACILITY | 541 | 15.9%
TELEHEALTH PROVIDED OTHER THAN IN PATIENT'S HOME | 380 | 11.2%
HOSPITAL OUTPATIENT | 337 | 9.9%
TELEHEALTH PROVIDED IN PATIENT'S HOME | 125 | 3.7%
INPATIENT HOSPITAL | 83 | 2.4%
UNASSIGNED | 55 | 1.6%
ON CAMPUS-OUTPATIENT HOSPITAL | 54 | 1.6%
HOSPITAL INPATIENT (INCLUDING MEDICARE PART A) | 40 | 1.2%
Other values (16) | 182 | 5.3%

Length

Histogram of lengths of the category (plot not reproduced)

Words

Value | Count | Frequency (%)
hospital | 1127 | 11.0%
office | 1055 | 10.3%
(blank) | 621 | 6.1%
room | 555 | 5.4%
emergency | 555 | 5.4%
facility | 543 | 5.3%
urgent | 541 | 5.3%
care | 541 | 5.3%
provided | 530 | 5.2%
in | 505 | 4.9%
Other values (43) | 3685 | 35.9%

Most occurring characters

Value | Count | Frequency (%)
E | 8410 | 11.8%
(space) | 6866 | 9.6%
T | 6797 | 9.5%
I | 5990 | 8.4%
O | 5381 | 7.5%
A | 4726 | 6.6%
N | 3613 | 5.1%
H | 3590 | 5.0%
R | 3495 | 4.9%
C | 3175 | 4.4%
Other values (18) | 19308 | 27.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 71351 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 71351 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 71351 | 100.0%

INSURANCE_TYPE
Categorical

High correlation, Imbalance

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 KiB
COMMERCIAL | 2716
MEDICARE | 612
MEDICAID | 74
UNSPECIFIED | 5

Length

Max length: 11
Median length: 10
Mean length: 9.5987672
Min length: 8

Characters and Unicode

Total characters: 32703
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: COMMERCIAL
2nd row: COMMERCIAL
3rd row: COMMERCIAL
4th row: COMMERCIAL
5th row: COMMERCIAL

Common Values

Value | Count | Frequency (%)
COMMERCIAL | 2716 | 79.7%
MEDICARE | 612 | 18.0%
MEDICAID | 74 | 2.2%
UNSPECIFIED | 5 | 0.1%
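The 58.0% imbalance in the alerts can be reproduced as one minus the normalized Shannon entropy of the category frequencies, 1 − H / ln(k), where k is the number of distinct values. The report does not state that ydata-profiling uses exactly this formula. That is our inference, based on the fact that the formula matches the reported figure.

```python
import math


def imbalance(counts):
    """1 - normalized Shannon entropy: 0 = perfectly balanced, approaching 1 = one dominant class."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)  # natural-log entropy H
    return 1 - entropy / math.log(len(counts))      # divide by the maximum, ln(k)


insurance_counts = [2716, 612, 74, 5]  # COMMERCIAL, MEDICARE, MEDICAID, UNSPECIFIED
print(f"{imbalance(insurance_counts):.1%}")  # 58.0%
```

Under the same formula, PATIENT_GENDER (57.0% vs 43.0%) scores only about 1%, consistent with it not being flagged.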

Length

Histogram of lengths of the category (plot not reproduced)

Words

Value | Count | Frequency (%)
commercial | 2716 | 79.7%
medicare | 612 | 18.0%
medicaid | 74 | 2.2%
unspecified | 5 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
C | 6123 | 18.7%
M | 6118 | 18.7%
E | 4024 | 12.3%
I | 3486 | 10.7%
A | 3402 | 10.4%
R | 3328 | 10.2%
O | 2716 | 8.3%
L | 2716 | 8.3%
D | 765 | 2.3%
U | 5 | < 0.1%
Other values (4) | 20 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 32703 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 32703 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 32703 | 100.0%

PATIENT_AGE_DIAGNOSED
Real number (ℝ)

High correlation, Zeros

Distinct: 86
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47.144409
Minimum: 0
Maximum: 85
Zeros: 38
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 26.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 3
Q1: 29
Median: 50
Q3: 67
95th percentile: 84
Maximum: 85
Range: 85
Interquartile range (IQR): 38

Descriptive statistics

Standard deviation: 24.145581
Coefficient of variation (CV): 0.51216214
Kurtosis: -0.93776316
Mean: 47.144409
Median Absolute Deviation (MAD): 19
Skewness: -0.2823762
Sum: 160621
Variance: 583.00909
Monotonicity: Not monotonic
Histogram with fixed-size bins (bins=50; plot not reproduced)

Common values

Value | Count | Frequency (%)
85 | 162 | 4.8%
1 | 79 | 2.3%
36 | 60 | 1.8%
62 | 59 | 1.7%
51 | 58 | 1.7%
52 | 56 | 1.6%
65 | 55 | 1.6%
69 | 54 | 1.6%
59 | 54 | 1.6%
72 | 54 | 1.6%
Other values (76) | 2716 | 79.7%

Minimum 10 values

Value | Count | Frequency (%)
0 | 38 | 1.1%
1 | 79 | 2.3%
2 | 41 | 1.2%
3 | 30 | 0.9%
4 | 24 | 0.7%
5 | 25 | 0.7%
6 | 32 | 0.9%
7 | 19 | 0.6%
8 | 19 | 0.6%
9 | 20 | 0.6%

Maximum 10 values

Value | Count | Frequency (%)
85 | 162 | 4.8%
84 | 19 | 0.6%
83 | 25 | 0.7%
82 | 21 | 0.6%
81 | 33 | 1.0%
80 | 28 | 0.8%
79 | 42 | 1.2%
78 | 30 | 0.9%
77 | 45 | 1.3%
76 | 36 | 1.1%

STATE
Categorical

High cardinality, High correlation

Distinct: 52
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
CA | 460
TX | 335
FL | 301
NY | 228
MI | 125
Other values (47) | 1958

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 6814
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: TX
2nd row: PA
3rd row: MS
4th row: PA
5th row: CA

Common Values

Value | Count | Frequency (%)
CA | 460 | 13.5%
TX | 335 | 9.8%
FL | 301 | 8.8%
NY | 228 | 6.7%
MI | 125 | 3.7%
MD | 108 | 3.2%
IL | 105 | 3.1%
PA | 102 | 3.0%
GA | 101 | 3.0%
OH | 96 | 2.8%
Other values (42) | 1446 | 42.4%
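Each Common Values table lists the most frequent values and folds the rest into a single "Other values (k)" row, where k is the number of remaining distinct values. Here is a sketch of that aggregation with `collections.Counter`. The function name `common_values` and its `top_n` parameter are hypothetical illustrations, not ydata-profiling internals.

```python
from collections import Counter


def common_values(values, top_n=10):
    """Return (value, count, percent) rows for the top_n values plus an 'Other values (k)' row."""
    counts = Counter(values)
    total = len(values)
    top = counts.most_common(top_n)
    rows = [(value, count, 100 * count / total) for value, count in top]
    remaining_distinct = len(counts) - len(top)
    if remaining_distinct > 0:
        other = total - sum(count for _, count in top)
        rows.append((f"Other values ({remaining_distinct})", other, 100 * other / total))
    return rows


# Toy data, not the real STATE column.
toy_states = ["CA"] * 5 + ["TX"] * 3 + ["FL"] * 2 + ["NY", "MI"]
for value, count, pct in common_values(toy_states, top_n=2):
    print(f"{value} | {count} | {pct:.1f}%")
```

Applied to STATE, the ten listed states together account for 1,961 rows. That leaves 3,407 − 1,961 = 1,446 rows (42.4%) spread over the remaining 42 states, as in the last row of the table.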

Length

Histogram of lengths of the category (plot not reproduced)

Words

Value | Count | Frequency (%)
ca | 460 | 13.5%
tx | 335 | 9.8%
fl | 301 | 8.8%
ny | 228 | 6.7%
mi | 125 | 3.7%
md | 108 | 3.2%
il | 105 | 3.1%
pa | 102 | 3.0%
ga | 101 | 3.0%
oh | 96 | 2.8%
Other values (42) | 1446 | 42.4%

Most occurring characters

Value | Count | Frequency (%)
A | 1124 | 16.5%
C | 727 | 10.7%
N | 646 | 9.5%
L | 523 | 7.7%
T | 477 | 7.0%
M | 464 | 6.8%
I | 369 | 5.4%
X | 335 | 4.9%
F | 301 | 4.4%
Y | 286 | 4.2%
Other values (14) | 1562 | 22.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 6814 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 6814 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 6814 | 100.0%

HCP_SPECIALTY
Categorical

High cardinality, High correlation

Distinct: 78
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 11.0 KiB
FAMILY MEDICINE | 777
NURSE PRACTITIONER | 551
EMERGENCY MEDICINE | 540
INTERNAL MEDICINE | 478
PHYSICIAN ASSISTANT | 400
Other values (73) | 661

Length

Max length: 50
Median length: 43
Mean length: 17.525976
Min length: 7

Characters and Unicode

Total characters: 59711
Distinct characters: 29
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 25
Unique (%): 0.7%

Sample

1st row: FAMILY MEDICINE
2nd row: EMERGENCY MEDICINE
3rd row: EMERGENCY MEDICINE
4th row: PEDIATRICS
5th row: PEDIATRICS

Common Values

Value | Count | Frequency (%)
FAMILY MEDICINE | 777 | 22.8%
NURSE PRACTITIONER | 551 | 16.2%
EMERGENCY MEDICINE | 540 | 15.8%
INTERNAL MEDICINE | 478 | 14.0%
PHYSICIAN ASSISTANT | 400 | 11.7%
PEDIATRICS | 207 | 6.1%
ANATOMIC/CLINICAL PATHOLOGY | 68 | 2.0%
DIAGNOSTIC RADIOLOGY | 44 | 1.3%
INTERNAL MEDICINE/PEDIATRICS | 29 | 0.9%
OBSTETRICS & GYNECOLOGY | 27 | 0.8%
Other values (68) | 286 | 8.4%

Length

Histogram of lengths of the category (plot not reproduced)

Words

Value | Count | Frequency (%)
medicine | 1908 | 28.3%
family | 785 | 11.6%
emergency | 573 | 8.5%
nurse | 552 | 8.2%
practitioner | 551 | 8.2%
internal | 524 | 7.8%
physician | 400 | 5.9%
assistant | 400 | 5.9%
pediatrics | 232 | 3.4%
pathology | 79 | 1.2%
Other values (83) | 749 | 11.1%

Most occurring characters

Value | Count | Frequency (%)
I | 8795 | 14.7%
E | 7981 | 13.4%
N | 5845 | 9.8%
C | 4302 | 7.2%
A | 4121 | 6.9%
M | 3444 | 5.8%
R | 3435 | 5.8%
(space) | 3346 | 5.6%
T | 3196 | 5.4%
S | 2761 | 4.6%
Other values (19) | 12485 | 20.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 59711 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 59711 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 59711 | 100.0%

HCP_GENDER
Categorical

High correlation

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB
M-Male | 1738
F-Female | 1328
U-Unknown | 341

Length

Max length: 9
Median length: 6
Mean length: 7.0798356
Min length: 6

Characters and Unicode

Total characters: 24121
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M-Male
2nd row: M-Male
3rd row: F-Female
4th row: F-Female
5th row: F-Female

Common Values

Value | Count | Frequency (%)
M-Male | 1738 | 51.0%
F-Female | 1328 | 39.0%
U-Unknown | 341 | 10.0%

Length

Histogram of lengths of the category (plot not reproduced)

Words

Value | Count | Frequency (%)
m-male | 1738 | 51.0%
f-female | 1328 | 39.0%
u-unknown | 341 | 10.0%

Most occurring characters

Value | Count | Frequency (%)
e | 4394 | 18.2%
M | 3476 | 14.4%
- | 3407 | 14.1%
a | 3066 | 12.7%
l | 3066 | 12.7%
F | 2656 | 11.0%
m | 1328 | 5.5%
n | 1023 | 4.2%
U | 682 | 2.8%
k | 341 | 1.4%
Other values (2) | 682 | 2.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 24121 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 24121 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 24121 | 100.0%

TARGET
Categorical

High correlation

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 166.5 KiB
0 | 2821
1 | 586

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 3407
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 2821 | 82.8%
1 | 586 | 17.2%

Length

Histogram of lengths of the category (plot not reproduced)

Words

Value | Count | Frequency (%)
0 | 2821 | 82.8%
1 | 586 | 17.2%

Most occurring characters

Value | Count | Frequency (%)
0 | 2821 | 82.8%
1 | 586 | 17.2%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 3407 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 3407 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 3407 | 100.0%

Interactions

Interaction plots for each pair of the 5 numeric variables (25 plots, not reproduced)

Correlations

Variable | HCP_GENDER | HCP_ID | HCP_SPECIALTY | INSURANCE_TYPE | NUM_CONDITIONS | NUM_CONTRAINDICATIONS | PATIENT_AGE_DIAGNOSED | PATIENT_GENDER | PATIENT_ID | STATE | TARGET | TXN_LOCATION_TYPE
HCP_GENDER | 1.000 | 0.191 | 0.361 | 0.000 | 0.031 | 0.007 | 0.031 | 0.053 | 1.000 | 0.110 | 0.036 | 0.170
HCP_ID | 0.191 | 1.000 | 0.193 | 0.086 | 0.076 | 0.039 | 0.036 | 0.058 | 0.031 | 0.192 | 0.092 | 0.150
HCP_SPECIALTY | 0.361 | 0.193 | 1.000 | 0.232 | 0.135 | 0.000 | 0.215 | 0.106 | 1.000 | 0.079 | 0.181 | 0.321
INSURANCE_TYPE | 0.000 | 0.086 | 0.232 | 1.000 | 0.183 | 0.047 | 0.358 | 0.000 | 1.000 | 0.141 | 0.079 | 0.269
NUM_CONDITIONS | 0.031 | 0.076 | 0.135 | 0.183 | 1.000 | 0.423 | 0.612 | 0.000 | 0.331 | 0.050 | 0.127 | 0.084
NUM_CONTRAINDICATIONS | 0.007 | 0.039 | 0.000 | 0.047 | 0.423 | 1.000 | 0.410 | 0.034 | 0.272 | 0.000 | 0.012 | 0.069
PATIENT_AGE_DIAGNOSED | 0.031 | 0.036 | 0.215 | 0.358 | 0.612 | 0.410 | 1.000 | 0.088 | 0.374 | 0.048 | 0.241 | 0.120
PATIENT_GENDER | 0.053 | 0.058 | 0.106 | 0.000 | 0.000 | 0.034 | 0.088 | 1.000 | 1.000 | 0.000 | 0.010 | 0.000
PATIENT_ID | 1.000 | 0.031 | 1.000 | 1.000 | 0.331 | 0.272 | 0.374 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
STATE | 0.110 | 0.192 | 0.079 | 0.141 | 0.050 | 0.000 | 0.048 | 0.000 | 1.000 | 1.000 | 0.134 | 0.108
TARGET | 0.036 | 0.092 | 0.181 | 0.079 | 0.127 | 0.012 | 0.241 | 0.010 | 1.000 | 0.134 | 1.000 | 0.203
TXN_LOCATION_TYPE | 0.170 | 0.150 | 0.321 | 0.269 | 0.084 | 0.069 | 0.120 | 0.000 | 1.000 | 0.108 | 0.203 | 1.000
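The High correlation alerts at the top of the report correspond to the off-diagonal entries of this matrix that exceed 0.5. The 0.5 threshold is our assumption. It does reproduce the alert list exactly: 0.612 (NUM_CONDITIONS and PATIENT_AGE_DIAGNOSED) is flagged, while 0.423 and 0.410 are not. Here is a sketch that extracts those pairs from the matrix as transcribed above:

```python
NAMES = ["HCP_GENDER", "HCP_ID", "HCP_SPECIALTY", "INSURANCE_TYPE", "NUM_CONDITIONS",
         "NUM_CONTRAINDICATIONS", "PATIENT_AGE_DIAGNOSED", "PATIENT_GENDER",
         "PATIENT_ID", "STATE", "TARGET", "TXN_LOCATION_TYPE"]

# Matrix transcribed from the table above; rows and columns follow NAMES order.
MATRIX = [
    [1.000, 0.191, 0.361, 0.000, 0.031, 0.007, 0.031, 0.053, 1.000, 0.110, 0.036, 0.170],
    [0.191, 1.000, 0.193, 0.086, 0.076, 0.039, 0.036, 0.058, 0.031, 0.192, 0.092, 0.150],
    [0.361, 0.193, 1.000, 0.232, 0.135, 0.000, 0.215, 0.106, 1.000, 0.079, 0.181, 0.321],
    [0.000, 0.086, 0.232, 1.000, 0.183, 0.047, 0.358, 0.000, 1.000, 0.141, 0.079, 0.269],
    [0.031, 0.076, 0.135, 0.183, 1.000, 0.423, 0.612, 0.000, 0.331, 0.050, 0.127, 0.084],
    [0.007, 0.039, 0.000, 0.047, 0.423, 1.000, 0.410, 0.034, 0.272, 0.000, 0.012, 0.069],
    [0.031, 0.036, 0.215, 0.358, 0.612, 0.410, 1.000, 0.088, 0.374, 0.048, 0.241, 0.120],
    [0.053, 0.058, 0.106, 0.000, 0.000, 0.034, 0.088, 1.000, 1.000, 0.000, 0.010, 0.000],
    [1.000, 0.031, 1.000, 1.000, 0.331, 0.272, 0.374, 1.000, 1.000, 1.000, 1.000, 1.000],
    [0.110, 0.192, 0.079, 0.141, 0.050, 0.000, 0.048, 0.000, 1.000, 1.000, 0.134, 0.108],
    [0.036, 0.092, 0.181, 0.079, 0.127, 0.012, 0.241, 0.010, 1.000, 0.134, 1.000, 0.203],
    [0.170, 0.150, 0.321, 0.269, 0.084, 0.069, 0.120, 0.000, 1.000, 0.108, 0.203, 1.000],
]


def high_correlation_pairs(names, matrix, threshold=0.5):
    """Return (a, b, value) for each pair above the diagonal whose |value| exceeds threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only: each pair once
            if abs(matrix[i][j]) > threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


for a, b, value in high_correlation_pairs(NAMES, MATRIX):
    print(f"{a} ~ {b}: {value:.3f}")
```

Seven of the eight pairs involve PATIENT_ID, matching the alert "PATIENT_ID is highly correlated with HCP_GENDER and 6 other fields".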

Missing values

Nullity by column (plot not reproduced): a simple visualization of nullity by column.
Nullity matrix (plot not reproduced): a data-dense display that lets you quickly spot patterns in data completion.

Sample

Index | PATIENT_ID | PATIENT_GENDER | NUM_CONDITIONS | NUM_CONTRAINDICATIONS | TXN_DT | HCP_ID | TXN_LOCATION_TYPE | INSURANCE_TYPE | PATIENT_AGE_DIAGNOSED | STATE | HCP_SPECIALTY | HCP_GENDER | TARGET
0 | 1 | M-Male | 1 | 0 | 2022-06-11 | 24633 | EMERGENCY ROOM - HOSPITAL | COMMERCIAL | 34 | TX | FAMILY MEDICINE | M-Male | 0
1 | 2 | M-Male | 1 | 0 | 2022-06-22 | 7777 | EMERGENCY ROOM - HOSPITAL | COMMERCIAL | 2 | PA | EMERGENCY MEDICINE | M-Male | 0
2 | 3 | M-Male | 1 | 0 | 2022-06-20 | 17051 | OFFICE | COMMERCIAL | 49 | MS | EMERGENCY MEDICINE | F-Female | 0
3 | 4 | M-Male | 1 | 0 | 2022-06-30 | 19478 | EMERGENCY ROOM - HOSPITAL | COMMERCIAL | 0 | PA | PEDIATRICS | F-Female | 0
4 | 7 | M-Male | 1 | 0 | 2022-06-06 | 8189 | INPATIENT HOSPITAL | COMMERCIAL | 1 | CA | PEDIATRICS | F-Female | 0
5 | 8 | M-Male | 1 | 0 | 2022-06-28 | 21499 | EMERGENCY ROOM - HOSPITAL | COMMERCIAL | 45 | WV | DIAGNOSTIC RADIOLOGY | M-Male | 0
6 | 9 | M-Male | 2 | 0 | 2022-06-30 | 841 | HOSPITAL OUTPATIENT | COMMERCIAL | 31 | ME | EMERGENCY MEDICINE | U-Unknown | 0
7 | 10 | F-Female | 1 | 0 | 2022-06-01 | 12379 | EMERGENCY ROOM - HOSPITAL | COMMERCIAL | 74 | FL | NURSE PRACTITIONER | F-Female | 0
8 | 11 | M-Male | 2 | 0 | 2022-06-23 | 11336 | HOSPITAL OUTPATIENT | MEDICARE | 32 | FL | EMERGENCY MEDICINE | M-Male | 0
9 | 12 | M-Male | 1 | 0 | 2022-06-22 | 10655 | EMERGENCY ROOM - HOSPITAL | COMMERCIAL | 2 | AL | PEDIATRIC EMERGENCY MEDICINE (PEDIATRICS) | M-Male | 0
PATIENT_IDPATIENT_GENDERNUM_CONDITIONSNUM_CONTRAINDICATIONSTXN_DTHCP_IDTXN_LOCATION_TYPEINSURANCE_TYPEPATIENT_AGE_DIAGNOSEDSTATEHCP_SPECIALTYHCP_GENDERTARGET
33974007M-Male302022-06-1212091URGENT CARE FACILITYCOMMERCIAL25WAEMERGENCY MEDICINEM-Male0
33984008F-Female17192022-06-2412194TELEHEALTH PROVIDED OTHER THAN IN PATIENT'S HOMECOMMERCIAL59CAEMERGENCY MEDICINEM-Male1
33994009F-Female1202022-06-1416999TELEHEALTH PROVIDED IN PATIENT'S HOMEMEDICARE72CAFAMILY MEDICINEM-Male1
34004010M-Male9612022-06-0620038TELEHEALTH PROVIDED IN PATIENT'S HOMEMEDICARE74CAINTERNAL MEDICINEM-Male1
34014012F-Female402022-06-293491OFFICEMEDICARE73ILNURSE PRACTITIONERF-Female1
34024015M-Male2202022-06-205692OTHER PLACE OF SERVICECOMMERCIAL72CANURSE PRACTITIONERU-Unknown1
34034016F-Female7162022-06-0315294OFFICEMEDICARE75NJINTERNAL MEDICINEF-Female0
34044017F-Female64542022-06-2111575OFF CAMPUS-OUTPATIENT HOSPITALMEDICARE85ILFAMILY MEDICINEF-Female1
34054019M-Male15392022-06-075402EMERGENCY ROOM - HOSPITALCOMMERCIAL64TXEMERGENCY MEDICINEF-Female1
34064020F-Female2912022-06-2717966OFFICECOMMERCIAL81WIFAMILY MEDICINEM-Male1